Word Searching in CCITT Group 4 Compressed Document Images

Authors

  • Yue Lu
  • Chew Lim Tan
Abstract

In this paper, we present a compressed pattern matching method for searching for user-queried words in CCITT Group 4 compressed document images without decompressing them. Feature pixels, composed of black changing elements and white changing elements, are extracted directly from the compressed images. Connected components are labeled line by line according to the relative positions of the changing elements of the current coding line and those of the reference line. Word boxes are obtained by merging the connected components. A two-stage matching strategy measures the dissimilarity between the template image of the user's query word and the words extracted from the document images. Experimental results confirmed the validity of the proposed approach.
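The line-by-line labeling step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the changing elements of each line are already available as a sorted list of column positions (the paper obtains them from the Group 4 code stream; here the hypothetical helper `changing_elements` derives them from a decoded row purely for demonstration). Runs on the current line are linked to runs on the reference line when they touch under 8-connectivity, which is also an assumption. All function names are illustrative, and the merging of components into word boxes is not shown.

```python
from typing import Dict, List, Tuple

def changing_elements(row: List[int]) -> List[int]:
    """Columns where the pixel colour differs from the previous pixel.
    An imaginary white pixel (0) is assumed to precede the row."""
    changes, prev = [], 0
    for x, pixel in enumerate(row):
        if pixel != prev:
            changes.append(x)
            prev = pixel
    return changes

def black_runs(changes: List[int], width: int) -> List[Tuple[int, int]]:
    """Pair changing elements into half-open black runs [start, end)."""
    runs = []
    for i in range(0, len(changes), 2):
        end = changes[i + 1] if i + 1 < len(changes) else width
        runs.append((changes[i], end))
    return runs

def label_components(lines: List[List[int]], width: int) -> List[Tuple[int, int, int, int]]:
    """Label black runs line by line and return the bounding boxes
    (x0, y0, x1, y1) of the connected components, sorted."""
    parent: List[int] = []  # union-find forest over provisional labels

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labelled = []   # (y, start, end, label) for every black run
    reference = []  # labelled runs of the previous (reference) line
    for y, changes in enumerate(lines):
        current = []
        for start, end in black_runs(changes, width):
            label = None
            for ref_start, ref_end, ref_label in reference:
                # Runs touch (8-connectivity) if they overlap or meet diagonally.
                if ref_start <= end and start <= ref_end:
                    if label is None:
                        label = ref_label
                    else:
                        union(label, ref_label)
            if label is None:  # no neighbour above: start a new component
                label = len(parent)
                parent.append(label)
            current.append((start, end, label))
            labelled.append((y, start, end, label))
        reference = current

    boxes: Dict[int, Tuple[int, int, int, int]] = {}
    for y, start, end, label in labelled:
        root = find(label)
        x0, y0, x1, y1 = boxes.get(root, (start, y, end - 1, y))
        boxes[root] = (min(x0, start), min(y0, y), max(x1, end - 1), max(y1, y))
    return sorted(boxes.values())
```

Because only run boundaries are compared, the work per line depends on the number of changing elements rather than on the image width, which is the appeal of operating on the compressed representation.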

Similar resources

Keyword Searching in Compressed Document Images

A huge number of document images are accessible on the Internet and in digital libraries. We find that most of them are packed in PDF files and compressed using the CCITT Group 4 standard to save storage space and speed up transmission. It is therefore worthwhile to develop methods for searching keywords directly in these documents. In this paper, we present a compressed patter...

Similarity measure for CCITT Group 4 compressed document images

Similarity measurement of document images plays a crucial role in document image retrieval. A method of measuring the similarity of CCITT Group 4 compressed document images is proposed in this paper. The features are extracted directly from the changing elements of the compressed images. Weighted Hausdorff distance is utilized to assign all of the word objects from two document images to...

Document retrieval from compressed images

With the emergence of digital libraries, more and more documents are stored and transmitted through the Internet in the form of compressed images. It is therefore important to develop a system capable of retrieving documents from these compressed document images. Aiming at the popular compression standard CCITT Group 4, which is widely used for compressing document images, we presen...

Document matching on CCITT Group 4 compressed images

A method is proposed for detecting whether two CCITT Group 4 images were scanned from the same document. Features are extracted from rectangular patches of text and compared with a modified Hausdorff distance measure. Two images are said to be "equivalent" (i.e., they were scanned from the same document) if the Hausdorff measure finds that a specified number of features are located within a g...

Rapid Manipulation of Images Compressed by the CCITT Group III 1-D Coding Scheme

The problem of performing operations on images compressed according to the 1-D CCITT Group III coding scheme is addressed. It is motivated by the fact that, intuitively, manipulating compressed information should result in faster execution and better utilization of memory and I/O resources. A formal model of the problem is sketched. Point operations are shown to be achieved by an attributed nit...

Publication date: 2003